FPGAs are known to permit huge gains in performance and efficiency for suitable applications but still require reduced design efforts and shorter development cycles for wider adoption. In this work, we compare the resulting performance of two design concepts that, in different ways, promise such increased productivity. As a common starting point, we employ a kernel-centric design approach, where computational hotspots in an application are identified and individually accelerated on the FPGA. By means of a complex stereo matching application, we evaluate two fundamentally different design philosophies and approaches for implementing the required kernels on FPGAs. In the first implementation approach, we designed individually specialized dataflow kernels in a spatial programming language for a Maxeler FPGA platform; in the alternative design approach, we target a vector coprocessor with large vector lengths, which is implemented as a form of programmable overlay on the application FPGAs of a Convey HC-1. We assess both approaches in terms of overall system performance, raw kernel performance, and performance relative to invested resources. After compensating for the effects of the underlying hardware platforms, the specialized dataflow kernels on the Maxeler platform are around 3x faster than kernels executing on the Convey vector coprocessor. In our concrete scenario, due to trade-offs between reconfiguration overheads and exposed parallelism, the advantage of specialized dataflow kernels is reduced to around 2.5x.